Selecting Rows and Columns in Pandas DataFrames Using iloc, loc, and ix


2023-04-13 16:36 | Source: compiled from the web

Pandas offers three main options for selection and indexing, which can be confusing. The three selection cases and methods covered in this post are:

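- Selecting data by row numbers (.iloc)
- Selecting data by label or by a conditional statement (.loc)
- Selecting in a hybrid approach (.ix) (deprecated since Pandas 0.20 and removed in Pandas 1.0)

Data setup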

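This blog post, inspired by other tutorials, describes selection activities with these operations. The tutorial is suited to the general data science situation where, typically, I find that: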

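- Each row in the data frame represents a data sample.
- Each column is a variable, and is usually named. I rarely select columns without their names.
- I frequently need to select relevant rows from the data frame for modelling and visualisation.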

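For the uninitiated, the Pandas library for Python provides high-performance, easy-to-use data structures and data analysis tools for handling tabular data in "Series" and "DataFrames". It is very good at making data processing easier, and I have written before about grouping and summarising data with Pandas.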

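Figure: Summary of the iloc and loc methods discussed in this post, each taking two main arguments for rows and columns. iloc and loc are operations for retrieving data from Pandas DataFrames.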

Selection and indexing methods for Pandas DataFrames

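For these explorations we need some sample data. I downloaded the uk-500 sample data set from www.briandunning.com. It contains artificial names, addresses, companies, and phone numbers for fictitious UK characters. To follow along, you can download the .csv file here. Load the data as follows (the snippet was run in a Jupyter notebook under the Anaconda Python distribution):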

    import random

    import pandas as pd

    # Read the data from the downloaded CSV file.
    data = pd.read_csv('https://s3-eu-west-1.amazonaws.com/shanebucket/downloads/uk-500.csv')
    # Set a numeric id for use as an index for examples.
    data['id'] = [random.randint(0, 1000) for x in range(data.shape[0])]
    data.head(5)

Figure: Example data loaded from a CSV file, used for the iloc, loc, and ix indexing examples.
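Because the id column above is drawn with random.randint, the values can repeat and will differ on every run, which matters later when id is used as an index. The following is a minimal sketch of one way to get reproducible, unique ids instead. It assumes a tiny made-up frame (the people variable and its rows are hypothetical stand-ins for the uk-500 data) and an arbitrary seed of 42.

```python
import random

import pandas as pd

# Hypothetical stand-in for the uk-500 data: three made-up rows.
people = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cy"],
    "last_name": ["Alpha", "Beta", "Gamma"],
})

# Seed the generator so the same ids are produced on every run.
random.seed(42)

# random.sample draws without replacement, so every id is unique.
people["id"] = random.sample(range(1001), k=people.shape[0])

print(people)
```

Whether unique ids are desirable depends on the example: duplicated index labels are legal in Pandas, but a lookup by a duplicated label then returns several rows rather than one.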

1. Selecting Pandas data using "iloc"

The iloc indexer for Pandas DataFrames is used for integer-location based indexing, that is, selection by position.

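The iloc indexer syntax is data.iloc[<row selection>, <column selection>], which is sure to be a source of confusion for R users. "iloc" in Pandas selects rows and columns by number, in the order in which they appear in the data frame. You can imagine that each row has a row number from 0 to data.shape[0] - 1, and iloc[] allows selections based on these numbers. The same applies to columns (which range from 0 to data.shape[1] - 1).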

iloc takes two "arguments": a row selector and a column selector. For example:

    # Single selections using iloc and DataFrame
    # Rows:
    data.iloc[0]      # first row of data frame (Aleshia Tomkiewicz) - note a Series data type output
    data.iloc[1]      # second row of data frame (Evan Zigomalas)
    data.iloc[-1]     # last row of data frame (Mi Richan)
    # Columns:
    data.iloc[:, 0]   # first column of data frame (first_name)
    data.iloc[:, 1]   # second column of data frame (last_name)
    data.iloc[:, -1]  # last column of data frame (id)

Multiple columns and rows can be selected together using the .iloc indexer.

    # Multiple row and column selections using iloc and DataFrame
    data.iloc[0:5]                       # first five rows of data frame
    data.iloc[:, 0:2]                    # first two columns of data frame with all rows
    data.iloc[[0, 3, 6, 24], [0, 5, 6]]  # 1st, 4th, 7th, 25th rows + 1st, 6th, 7th columns
    data.iloc[0:5, 5:8]                  # first 5 rows and 6th, 7th, 8th columns (county -> phone1)

There are two gotchas to remember when using iloc in this manner:

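Note that .iloc returns a Pandas Series when a single row or a single column is selected with a plain integer, and a Pandas DataFrame when multiple rows or columns are selected. To counter this, pass a single-valued list if you require DataFrame output.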

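Figure: When using .loc or .iloc, you can control the output format by passing lists or single values to the selectors.

When selecting multiple columns or rows in this manner, remember that a slice such as [1:5] runs from the first number to one less than the second number. For example, [1:5] selects 1, 2, 3, 4, and in general [x:y] runs from x to y-1.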
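Both gotchas can be checked on a small frame. The sketch below assumes a made-up four-row DataFrame (df and its contents are hypothetical, not the uk-500 data) and shows how a scalar selector versus a one-element list changes the return type, and how a slice excludes its end position.

```python
import pandas as pd

# Hypothetical 4x3 frame used only for illustration.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cy", "Di"],
    "city": ["Leeds", "York", "Bath", "Hull"],
    "id": [11, 22, 33, 44],
})

row_as_series = df.iloc[0]      # scalar row selector -> Series
row_as_frame = df.iloc[[0]]     # one-element list -> DataFrame with one row
col_as_series = df.iloc[:, 0]   # scalar column selector -> Series
col_as_frame = df.iloc[:, [0]]  # one-element list -> DataFrame with one column

# Slices exclude the end position: 1:3 selects positions 1 and 2 only.
middle_rows = df.iloc[1:3]

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(middle_rows)
```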

In practice, I rarely use the iloc indexer unless I want the first (.iloc[0]) or the last (.iloc[-1]) row of the data frame.

2. Selecting Pandas data using "loc"

The Pandas loc indexer can be used with DataFrames for two different use cases:

a.) Selecting rows by label/index
b.) Selecting rows with a boolean/conditional lookup

The loc indexer uses the same syntax as iloc: data.loc[<row selection>, <column selection>].

2a. Label-based / index-based indexing using .loc

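Selections using the loc method are based on the index of the data frame (if any). Where the index is set on a DataFrame using df.set_index(), the .loc method selects directly on the index value of any row. For example, setting the index of the test data frame to the person's "last_name":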

    data.set_index("last_name", inplace=True)
    data.head()

Figure: Pandas DataFrame with its index set using .set_index(), for the .loc[] explanation.

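Figure: Last name set as the index on the sample data frame.

Now, with the index set, we can directly select rows for different "last_name" values using .loc[<label>], either singly or in multiples. For example: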

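Figure: Label-based lookups in a data frame using .loc. Selecting single or multiple rows using .loc index selections with Pandas. Note that the first example returns a Series, and the second returns a DataFrame. You can get a one-row DataFrame by passing a single-element list to the .loc operation.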
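The label-based row lookups described here can be sketched as follows. The frame is a small made-up one (the names Alpha, Beta, and Gamma are hypothetical, not rows of the uk-500 data), indexed by last_name as in the text.

```python
import pandas as pd

# Hypothetical frame, indexed by last name as in the example above.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cy"],
    "last_name": ["Alpha", "Beta", "Gamma"],
    "city": ["Leeds", "York", "Bath"],
}).set_index("last_name")

one_row = df.loc["Beta"]               # single label -> Series
two_rows = df.loc[["Alpha", "Gamma"]]  # list of labels -> DataFrame
one_row_frame = df.loc[["Beta"]]       # one-element list -> one-row DataFrame

print(one_row)
print(two_rows)
```

Requesting a label that is not in the index raises a KeyError, so these lookups assume the labels exist.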

Select columns with .loc using the names of the columns. In most of my data work, I have named columns and use these names for selections.

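Figure: Selecting columns by name in Pandas .loc. When using the .loc indexer, columns are referred to by name, using a list of strings or a ":" slice.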
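As an illustration of column selection by name, the sketch below uses a small made-up frame (hypothetical rows, with column names borrowed from the uk-500 data) and selects columns with a list of names, with a name slice, and with a single name.

```python
import pandas as pd

# Hypothetical frame with a few of the uk-500 column names.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cy"],
    "last_name": ["Alpha", "Beta", "Gamma"],
    "address": ["1 High St", "2 Low Rd", "3 Mid Ln"],
    "city": ["Leeds", "York", "Bath"],
    "email": ["ann@example.com", "bob@example.com", "cy@example.com"],
}).set_index("last_name")

# A list of names picks exactly those columns, in the order given.
by_list = df.loc[["Alpha", "Gamma"], ["first_name", "email"]]

# A name slice includes BOTH endpoints: address, city, and email.
by_slice = df.loc[:, "address":"email"]

# A single row label plus a single column name returns one value.
single_value = df.loc["Beta", "city"]

print(by_list)
print(by_slice)
print(single_value)
```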

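You can select ranges of index labels: the selection data.loc['Bruch':'Julio'] returns all rows in the data frame between the index entries for "Bruch" and "Julio", with both endpoints included. The following examples should now make sense: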

    # Select rows with index values 'Andrade' and 'Veness', with all columns between 'city' and 'email'
    data.loc[['Andrade', 'Veness'], 'city':'email']
    # Select the range of rows from 'Andrade' to 'Veness', with just 'first_name', 'address' and 'city' columns
    data.loc['Andrade':'Veness', ['first_name', 'address', 'city']]

    # Change the index to be based on the 'id' column
    # (reset first so that 'last_name' is kept as a regular column)
    data.reset_index(inplace=True)
    data.set_index('id', inplace=True)
    # Select the row with 'id' = 487 (the ids are random here, so this label
    # may be missing, in which case a KeyError is raised)
    data.loc[487]

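Note that in the last example, data.loc[487] (the row with index value 487) is not equal to data.iloc[487] (the row at integer position 487 in the data). The index of a DataFrame can be out of numeric order, and/or made of strings or multiple values.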
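The difference between label and position is easiest to see on a frame whose integer index is out of order. The sketch below assumes a made-up four-row frame with hypothetical id labels 30, 10, 487, and 2. It also contrasts a label slice with .loc, which includes both endpoints, against a position slice with .iloc, which excludes the end.

```python
import pandas as pd

# Hypothetical frame whose integer index labels are not in numeric order.
df = pd.DataFrame(
    {"first_name": ["Ann", "Bob", "Cy", "Di"]},
    index=pd.Index([30, 10, 487, 2], name="id"),
)

by_label = df.loc[487]    # the row LABELLED 487 (Cy)
by_position = df.iloc[2]  # the row at POSITION 2, which is the same row here

# Position 487 does not exist in a four-row frame.
try:
    df.iloc[487]
except IndexError as exc:
    out_of_bounds_message = str(exc)

label_slice = df.loc[30:487]   # labels 30 through 487, both included
position_slice = df.iloc[0:2]  # positions 0 and 1 only

print(list(label_slice.index))
print(list(position_slice.index))
```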

2b. Boolean / logical indexing using .loc

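Conditional selection with boolean arrays using data.loc[<selection>] is the method I use most often with Pandas DataFrames. With boolean indexing or logical selection, you pass an array or Series of True/False values to the .loc indexer to select the rows where the Series has True values.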

In most use cases, you will make selections based on the values of different columns in your data set.

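For example, the statement data['first_name'] == 'Antonio' produces a Pandas Series with a True/False value for every row in the "data" DataFrame, where the value is True for the rows whose first_name is "Antonio". Boolean arrays of this kind can be passed directly to the .loc indexer: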

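Figure: The .loc indexer can accept boolean arrays to select rows. Using a boolean True/False Series to select rows in a Pandas data frame: all rows with the first name "Antonio" are selected.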
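The boolean selection described here can be sketched on a small made-up frame (the rows and email addresses below are hypothetical). The comparison produces a boolean Series, and passing it to .loc keeps only the rows where it is True.

```python
import pandas as pd

# Hypothetical frame with two rows named Antonio.
df = pd.DataFrame({
    "first_name": ["Antonio", "Bob", "Antonio", "Di"],
    "email": ["a@gmail.com", "b@hotmail.com", "c@hotmail.com", "d@gmail.com"],
})

# One True/False value per row of df.
is_antonio = df["first_name"] == "Antonio"

# Keep only the rows where the mask is True.
antonios = df.loc[is_antonio]

print(is_antonio)
print(antonios)
```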

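As before, a second argument can be passed to .loc to select particular columns out of the data frame. Again, columns are referred to by name for the loc indexer, and can be a single string, a list of columns, or a ":" slice.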

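Figure: Multiple column selection example using .loc. Selecting multiple columns with loc can be achieved by passing column names to the second argument of .loc[].

Note that when selecting columns, if only one column is selected, the .loc operator returns a Series. For a single-column DataFrame, use a one-element list to keep the DataFrame format, for example: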

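Figure: .loc returns a Series or a DataFrame depending on the selection. If a single column is selected as a string, a Series is returned from .loc. Pass a list to get a DataFrame back.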
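To make the return types concrete, here is a minimal sketch on a made-up frame (hypothetical rows and addresses). It combines a boolean row mask with a column given as a string, as a one-element list, and as a list of several names.

```python
import pandas as pd

# Hypothetical frame with two rows named Antonio.
df = pd.DataFrame({
    "first_name": ["Antonio", "Bob", "Antonio"],
    "city": ["Leeds", "York", "Bath"],
    "email": ["a@gmail.com", "b@hotmail.com", "c@hotmail.com"],
})

is_antonio = df["first_name"] == "Antonio"

email_series = df.loc[is_antonio, "email"]             # string -> Series
email_frame = df.loc[is_antonio, ["email"]]            # one-element list -> DataFrame
several_columns = df.loc[is_antonio, ["email", "city"]]  # list -> DataFrame

print(type(email_series).__name__)
print(type(email_frame).__name__)
print(several_columns)
```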

Make sure you understand the following additional examples of .loc selections for clarity:

    # 'id' was made the index above; turn it back into a regular column
    # so that data['id'] can be used in the conditions below.
    data.reset_index(inplace=True)

    # Select rows with first name Antonio, and all columns between 'city' and 'email'
    data.loc[data['first_name'] == 'Antonio', 'city':'email']

    # Select rows where the email column ends with 'hotmail.com', include all columns
    data.loc[data['email'].str.endswith("hotmail.com")]

    # Select rows with first_name equal to some values, all columns
    data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])]

    # Select rows with first name Antonio AND gmail email addresses
    data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')]

    # Select rows with id column between 100 and 200, and just return 'postal' and 'web' columns
    data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']]

    # Change the first name of all rows with an ID greater than 2000 to "John"
    # (the random ids generated above only run from 0 to 1000, so no rows match
    # unless the threshold is lowered)
    data.loc[data['id'] > 2000, "first_name"] = "John"
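The last two patterns, a combined condition and an assignment through .loc, can be verified on a small made-up frame. The ids below are hypothetical and chosen so that both conditions match at least one row. Each comparison is wrapped in parentheses because & binds more tightly than > and <= in Python.

```python
import pandas as pd

# Hypothetical frame with ids chosen so that each condition matches something.
df = pd.DataFrame({
    "id": [50, 150, 2500, 3000],
    "first_name": ["Ann", "Bob", "Cy", "Di"],
    "postal": ["LS1", "YO1", "BA1", "HU1"],
})

# Combine two conditions element-wise with &; parentheses are required.
in_range = (df["id"] > 100) & (df["id"] <= 200)
selected = df.loc[in_range, ["postal"]]

# Assigning through .loc changes only the matching rows of the named column.
df.loc[df["id"] > 2000, "first_name"] = "John"

print(selected)
print(df)
```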
